ThreadPool Executor vs ProcessPool Executor
Just want to know the difference? Here it is:
ProcessPoolExecutor
- Runs each of your workers in its own separate child process.
- Each child process has its own Global Interpreter Lock (GIL), so if the task you want to execute is CPU-bound, using 5 child processes can make things run almost 5x as fast, provided the machine has at least 5 free CPU cores.
- It is possible to share data between processes. However, to do this, the data must be serialized (pickled) and transmitted between processes. This mechanism is called inter-process communication (IPC). Only picklable data and state can be shared this way, and the serialization adds overhead.
- TL;DR: Sharing state between processes is harder and more heavyweight.
- Create 10s of workers, not 100s or 1000s.
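To make the CPU-bound case concrete, here is a minimal sketch. The names `count_primes` and `count_primes_parallel` are made up for illustration, and the actual speed-up depends on how many cores your machine has, so treat it as a starting point rather than a benchmark.

```python
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit):
    """Count primes below `limit` by trial division (deliberately CPU-heavy)."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_primes_parallel(limits, max_workers=4):
    """Run count_primes once per limit, spread across worker processes."""
    # Arguments and return values are pickled to cross the process boundary.
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(count_primes, limits))


if __name__ == "__main__":
    # The guard matters on platforms that start workers by re-importing this module.
    print(count_primes_parallel([20_000, 30_000, 40_000, 50_000]))
```

Note that `count_primes` is defined at module level. Functions handed to a process pool have to be picklable, which rules out lambdas and nested functions.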
ThreadPoolExecutor
- Runs each of your workers in separate threads within the main process.
- Mainly used for I/O-bound operations such as file reads/writes or network requests.
- Sharing data between threads is straightforward: threads can use global variables or data passed via function arguments.
- Multiple threads within a ThreadPoolExecutor are subject to the Global Interpreter Lock. This lock ensures that only one thread can execute Python bytecode at a time within a Python process. The lock is released while a thread waits on blocking I/O, which is why threads still help with I/O-bound work.
- Can create 10s to 1,000s of workers; threads are lightweight, so the count is not really constrained.
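Here is a small sketch of the I/O-bound case. The download is simulated with `time.sleep` (an assumption standing in for a real network call), and `fake_download`, `download_all`, and the shared `results` dict are hypothetical names. It shows two things from the list above: threads overlap their waiting time, and they can all write to the same in-memory object.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

results = {}                      # shared by every thread in this process
results_lock = threading.Lock()   # guards writes to the shared dict


def fake_download(name, delay=0.2):
    """Stand-in for a network request; time.sleep releases the GIL like blocking I/O."""
    time.sleep(delay)
    with results_lock:
        results[name] = f"contents of {name}"
    return name


def download_all(names, max_workers=5):
    """Run the simulated downloads concurrently and return the elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(fake_download, names))  # list() waits for every task
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = download_all(["a", "b", "c", "d", "e"])
    print(f"{len(results)} downloads in {elapsed:.2f}s")
```

Run one after another, five 0.2-second waits would take about a second. With five threads the waits overlap, so the total should be much closer to a single wait.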
Difference
If you just wanted to know the difference, you can stop reading here, but if you want to learn more, read on!
Threadpool Executor
The ThreadPoolExecutor class provides a thread pool in Python. What is a thread pool?
It is a group of pre-initialized threads that are on standby and ready to be given work.
When the pool is handed a task, it takes a thread from the pool, hands it the task, and the thread invokes the task's execute() method (in Python's ThreadPoolExecutor, the thread simply calls the function you passed to submit() or map()). Once the execution is complete, the thread hands itself back to the pool and goes to sleep until the next task arrives.
Each thread has these features:
- Belongs to a process
- Shares the same memory (state and data) as other threads in the same process
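You can watch the pool reuse its threads with a quick experiment. This is only a sketch, and `worker_name` and `threads_used` are names invented for it: we submit many more tasks than there are workers and record which thread ran each one.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def worker_name(_):
    """Return the name of the thread that ran this task."""
    time.sleep(0.01)  # brief pause so the tasks overlap
    return threading.current_thread().name


def threads_used(num_tasks=20, max_workers=3):
    """Submit num_tasks tasks and return the set of thread names that ran them."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return set(pool.map(worker_name, range(num_tasks)))


if __name__ == "__main__":
    names = threads_used()
    print(f"20 tasks were handled by {len(names)} thread(s): {sorted(names)}")
```

However many tasks you submit, the set never holds more than `max_workers` names, because each thread goes back to the pool and picks up the next task instead of being thrown away.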
Processpool Executor
The ProcessPoolExecutor class provides a process pool in Python. A process is an instance of a computer program. The pool is responsible for managing its worker processes:
- When they are created, such as when they are needed
- What they should do when they are not being used, such as making them wait without consuming computational resources
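To see that the workers really are separate processes managed by the pool, a rough sketch like the following can help. `worker_pid` and `pids_used` are hypothetical helper names; each task just reports the ID of the process it ran in.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def worker_pid(_):
    """Return the process ID of the worker that ran this task."""
    return os.getpid()


def pids_used(num_tasks=8, max_workers=2):
    """Run num_tasks tasks and return the set of worker process IDs that handled them."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return set(pool.map(worker_pid, range(num_tasks)))


if __name__ == "__main__":
    pids = pids_used()
    print("parent:", os.getpid(), "workers:", sorted(pids))
```

The worker IDs differ from the parent's, and there are never more of them than `max_workers`, even though more tasks than workers were submitted.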
When should we use Threadpool or Processpool?
TLDR - Use a process pool for CPU-bound tasks so you can benefit from multiple CPUs.
Use a thread pool for I/O-bound tasks so other threads can do work while one waits on I/O. Do not use a thread pool for long-running tasks such as monitoring or scheduling.
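Because both classes implement the same `Executor` interface from `concurrent.futures`, switching between them is usually a one-line change. The sketch below wraps that choice in two hypothetical helpers, `pick_executor` and `run_all`; the `cpu_bound` flag is an assumption about how you might classify your own workload.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def pick_executor(cpu_bound):
    """Return the pool class suggested by the rule of thumb above."""
    return ProcessPoolExecutor if cpu_bound else ThreadPoolExecutor


def run_all(fn, items, cpu_bound=False, max_workers=4):
    """Apply fn to every item using the pool type that matches the workload."""
    executor_cls = pick_executor(cpu_bound)
    with executor_cls(max_workers=max_workers) as pool:
        return list(pool.map(fn, items))


if __name__ == "__main__":
    print(run_all(abs, [-1, -2, 3]))                  # runs in a thread pool
    print(run_all(abs, [-1, -2, 3], cpu_bound=True))  # runs in a process pool
```

The calling code stays the same either way. Only the function and data you pass need to satisfy the extra pickling requirement when the process pool is chosen.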
Notes
- https://stackoverflow.com/questions/51828790/what-is-the-difference-between-processpoolexecutor-and-threadpoolexecutor
- https://superfastpython.com/threadpoolexecutor-vs-processpoolexecutor/
- https://softwareengineering.stackexchange.com/questions/173575/what-is-a-thread-pool